The titles in the bibliography for the three-year period must in 
the main speak for themselves. Limitations of space preclude a 
detailed analysis or review and only certain major tendencies and a 
few studies can be singled out for special mention. A grouping of 
the materials may be of some guidance to those who examine the 
bibliography with a general interest and to those whose interest 
centers about special fields of study. 


1. General Texts 

Texts for use in classes in educational psychology are represented 
by Averill (7), Gates (71), Pillsbury (174), and Taylor (210). 
Laboratory manuals for experiments in educational psychology 
include Peterson (172), Pyle (181), and Turner and Betts (222). 
Texts on psychological principles as applied to teaching are offered 
by Mead (146) and Pyle (178). 


2. Learning 


Apart from additional studies upon many familiar piecemeal 
problems two general tendencies are evident in the study of learning 
processes:

(a) A definite attempt to get at the fundamental nature of intelligence
and learning. Thurstone (218, 219) emphasizes functional
adjustments; Thomson (214, 215), many factors; Terman (211),
nature vs. nurture; Pyle (177, 180), the physiological basis;
Dashiell (45, 46), Herring (97), and Myers (156), behavioristic
interpretations; Koffka (127), Humphrey (102), and Ogden (161),
the psychology of Gestalt.

(b) The importance of social forces and social institutions for 
learning and education. General treatises on social psychology by 
Allport (2), Bernard (17), Dunlap (52), and Barnes (15) stress 
social implications in learning. This is notably true in Judd (114,
115, 116, 117, 118) and in the studies by the University of Chicago
High School Faculty (55,98). It appears also in Griffith (89) and 
in Chapman and Counts (38). 

Among special studies dealing with factors and conditions affect- 
ing the learning process may be noted those dealing with: 

(a) Motivation and incentives by Hurlock (103, 104), Knight 
and Remmers (125), Myers (157), Ruediger (197), Rusch (198), 
Whittemore (235), and Worcester (239). 

(b) Whole vs. part methods in learning by Brown (30), 
Reed (183), and Winch (236). 

(c) Transfer of training by Brooks (27), Gates (68), Knight
and Selzafandt (123), and Thorndike (217). 

(d) The conditioning of emotional responses by Jones (108,
109, 110, 111), and Watson (226, 227, 228).


3. The Exceptional Child 

To the psychology and education of exceptional children 
numerous contributions have been made. 

(a) The superior child. The important contribution by Terman
and others (219, 220) and the special studies by Aull (6), Bixby (19),
Freeman (60, 61), French (64), Furfey (65), Hollingworth (99,
100), Jones (112, 113), Lee (132), Pechstein (169), Richards-Nash (185),
and Woolley (237), indicate the interest in this
important problem.

(b) The problem child. Hollingworth (100), Morgan (150),
and Wallin (225) give valuable surveys and there are numerous 
special studies. 


4. The Preschool Child 

A special characteristic of this three-year period is the interest in 
the preschool child. Important books are Baldwin and Stecher (10), 
Drummond (51), Gesell (77, 78), and Martin (145); special contributions
in this field are by Andrus (5), Baldwin (11),
Douglass (49), Gates (43), Moss (152), Jones (108, 109, 110, 111),
and Watson (226, 227, 228).


5. Psychology of School Subjects 





Studies in curriculum making—what to learn—are very numer- 
ous but are not listed in this bibliography. Studies of factors and 
conditions in the learning of school subjects—how to learn most 
economically and effectively—are less numerous. General discus- 
sions of the psychology of school subjects are found in Edwards (54), 
Freeman (63), and LaRue (130). The psychology of reading is
represented by Gates and Van Alstyne (67, 74), Gates (75), Grant
and White (82, 83), Gray (84, 85), Green (87), Gregory (90),
Lowry (135), McCall and Crabbs (137), Pennell and Cusack (170),
Scott (202), Springsteed (206), and Zirbes (240). The special topic of
phonics is studied by Currier (44), Dougherty (48), and Vogel, Jay
and Washburne (223). This is a small list of investigations of an
experimental character for a three-year period and not all of those
listed are studies in learning to read. Contributions to the psychology
of spelling are found in Breed (23,24), Kingsley (122), Tidyman 
and Johnson (220), and Tidyman (221). 

Studies in the psychology of arithmetic are summarized by
Brown (28). Analyses of the difficulty of various number combinations
are given by Clapp (40), Balson and Combelleck (12);
the effect of practice by Greene (86), Knight (124), Osburn (163);
frequency, persistence and types of errors by Morton (151),
Myers (158); methods of learning and teaching by Barber (13),
Buckingham (31), Knight, Ruch and Lutes (126).

A special interest is manifest in the development of new type
examinations. Books by Paterson (168) and Ruch (195) meet a
definite need. Special studies by Bardy (14), Brinkley (25),
Farwell (56), Hughes (101), Remmers, Marchat, Brown and
Chapman (184), Ruch and Stoddard (194), indicate the interest in
this problem.

Special mention should be made of the studies in visual education
by Freeman and others (59).

INTELLIGENCE TESTS
BY RUDOLF PINTNER 


Teachers College, Columbia University 


The past few years have witnessed continued activity in the 
construction and use of intelligence tests as well as in the discussion 
of the theoretical issues involved. Intelligence tests are no longer a 
novelty. They are now used as a matter of course in countless 
schools, institutions and business houses, and much of the undesirable 
publicity occasioned by their widespread use in the army has now 
died down. They are beginning to take their rightful place in 
psychology and in everyday life. A book covering the whole field 
has been written by Pintner (124). 

History. Although intelligence tests seem very recent, they now 
have a decided history. A very complete account is given by Peter- 
son (118) of the work done up to the death of Binet in 1911. This 
book contains the best discussion and summary in English of Binet’s 
numerous researches. Other shorter historical surveys have been 
written by Young (184) and Pintner (124), and Burt has written 
the historical chapter in the English Consultative Committee’s Report 
on Psychological Tests (131). Cattell (22) in a general article refers
more particularly to the early work in this country. 

Theoretical Considerations. General descriptions of the nature 
of intelligence have been contributed by Thomson (157), Colvin (28) 
and Thorndike (161). Terman (151) argues that the mental test 
is a method of psychological experimentation and that there is no 
difference between tests and experiments in psychology. Two elabo- 
rate and detailed discussions of the nature of intelligence are found 
in the books of Spearman (140) and of Thurstone (164). These 
are the two most elaborate treatments up to the present time. 
Herring (71) attempts to synthesize the numerous definitions of 
intelligence. 

Thorndike (163) broadens the description of intelligence by suggesting
the aspects of level, range, speed and method. The relation
of these aspects to each other is studied by Clark (25), Bailor (6) 
and Hunsicker (77). 

That there may be different kinds of intelligence such as concrete
and abstract, verbal and non-verbal is brought out by Herring (70)
and is further discussed in the practical work of Walters (172) and
of Dashiell and Glenn (33).


Terman (153) and Whipple (173) discuss the problems of the
effects of training on intelligence ratings, believing in the possibility
of measuring native intelligence apart from training. Terman gives
suggestive data from bright children showing no correlation between
length of school attendance and achievement in the ordinary school
subjects. Gordon (52) on the other hand maintains that the I.Q.
is decidedly influenced by lack of schooling in studying children
whose school attendance has been very irregular. And Woolley (182)
finds that 63 per cent of her nursery school children show an increase
in I.Q. as compared with only 33 per cent of the non-nursery school
group. Woolley and Ferris (181) also show an increase in I.Q. of
children when put in an observation class, but this increase
would seem to be temporary. How much of the I.Q. is attributable
to schooling is, therefore, still debatable, and Holzinger and Freeman
(74) take Burt to task for his well known interpretation of
the regression equation, for, they say, “to interpret regression equations
as representing parts or portions of the independent variables
is unwarranted.” The more immediate effects of practice with tests
or training on tests is studied by Graves (53), Thorndike (158),
Odell (114) and Bishop (14), while Gates and Taylor (48), though
not dealing with intelligence tests, find that training has no permanent
effect upon memory for digits. The difference between the temporary
and permanent effects of training or practice is evidently of importance
in the general theory of intelligence.

Many studies deal with the growth of intelligence in order to
find out at what age it stops. Ballard (10) finds no growth after
sixteen; Teagarden (149) finds none after eighteen; Hopkins (74)
finds growth from fourteen to fifteen with Continuation School
children, while Thorndike (160) finds evidence of growth in High
School children in grades 9 to 11. Hart (63) finds little growth after
thirteen and a stoppage at sixteen or seventeen in a representative
sampling. The general mental growth of younger children is studied
by Johnson (80).

Thorndike and Bregman (162) show a normal distribution of
intelligence derived from nine different tests, and Symonds (148)
discusses the general distribution of intelligence of the whole
population.

There are now many studies of re-tests of children either with
the Binet or with group tests. The following articles deal with the
problem of the constancy of the I.Q.: Baldwin (8), Berry (12), 
Madsen (95), Garrison and Robinson (45), Gray and Marsden (55 
and 56), and Rugg (136). Most of the coefficients of correlation
range between .80 and .95. Pintner (123) gives correlations between
tests repeated after an interval of four years. 

The difference between I.Q.s obtained from different tests is
discussed by Miller (100). Jordan (83) discusses the validity of 
intelligence tests and McCall (97) the criteria of a test. The quo- 
tient method of reporting results is criticized by Rand (130). 
Thurstone (165) and Kelley (86) take up various principles and 
techniques in mental measurement. 

Scales. Many new intelligence scales or revisions of previous 
intelligence scales for the individual measurement of intelligence have 
appeared. There has been a decided activity in two directions: 
(1) construction of tests for pre-school children; (2) construction 
or revision of performance scales. 

Two sets of tests for pre-school children have appeared, namely 
Gesell’s (51) tests for infants and pre-school children, and Baldwin
and Stecher’s (9) scale for children from two to six.

Gaw (50) constructs a scale of fourteen performance tests mainly 
from the Pintner-Paterson series. Arthur (2) does likewise with
eight tests and uses a point scale method of scoring. Kohs (88) 
has standardized the block design test. Shakow and Kent (139) 
have constructed a new series of performance tests that is compact 
and portable for use in clinical work. Stenquist (142) has pub- 
lished the procedure and standardization of his well known mechani- 
cal tests, both written and performance. Easby-Grave (39) gives 
norms for some performance tests at the six-year-old level. Work 
with performance tests has been reported by Gaw (49), who finds 
performance tests less influenced by environment and lack of school- 
ing than is the Binet. Murdoch (112) studies the comparative value 
of nine performance tests, while Lowe (93) studies the Healy Con- 
struction Tests A and B. 

Four new revisions of the Binet have recently appeared. Her- 
ring (67) has constructed a new revision to serve as an alternative 
scale for the Stanford and he (68 and 69) claims a correlation of 
.98 between the two scales. Avery (5) and Wilner (175) also report 
correlations between these two scales. Yerkes and Foster (183) 
have revised the original point scale revision of the Binet. Hayes
and Irwin (64) have adapted the Stanford Revision for use with
the blind. Phillips (119) has made an Australian revision and extension
of the Binet Scale. Herderschee (66) has constructed an age
scale for the deaf using a few of the Binet tests but introducing a 
number of performance materials. 

Group Tests. A new group test, claiming high reliability and 
validity, has been published by McCall (98) under the name of the 
Multi-Mental Scale. Pintner and Cunningham (120) have con- 
structed a new test for kindergarten and first grade children, and 
Cunningham (32) discusses the prognostic value of this test. Another 
new test for kindergarten children, the Rhode Island Test, has been 
published by Bird (13), and Pressey (128) has issued a revision of 
his primer scale. Chapman (23) has devised a group test which can 
be used without prepared blanks, the questions being read out to the 
children. Toops (167) has published paper and performance tests 
for vocational guidance. Scott and Clothier (137) have described
the mental alertness tests used by them in personnel work. Kit- 
son (87) has discussed the value of tests in his wider treatment of 
the problems of vocational adjustment. In England Thomson (155 
and 156) has constructed the Northumberland Mental Tests and 
Tomlinson (166) the West Riding Tests of Mental Ability. 
Decroly (34) describes several group tests for French children, and 
Stern (143) for German children. 

Comparisons and evaluations of different group tests have been 
made by Wilson (176), Morgenthau (106), Freeman (43) and 
Avery (4). Comparative studies of group tests for the primary 
grades with attempts to find out which are the best have been made
by Heckman (65), Morrison et al. (107), Foran (42) and Johnson (81).
Pintner (125) has published correlations of the Pintner
Non-Language test with other tests as well as validity and reliability
coefficients, and Haggerty (58) has done the same thing for his
Delta 2, together with revised norms for this test.

Tests With School Children. The greatest use of intelligence
tests is naturally found in the schools and we can mention here only
a few of the numerous articles in which intelligence tests have been
so used. The value of tests in general with indications of their use 
by the classroom teacher is discussed in Dickson’s book (36), and 
Paulu (116) devotes considerable space in his book to this same 
topic. Irwin and Marks (78) show concretely how tests have been 
used in a school to fit the work to very different types of individuals. 
The problem of sectioning pupils according to ability has many
aspects and several of these are discussed by Woody (178 and 179),
Sullivan (144), Baker (7) and Brooks (16). The uses of intelligence
tests in High School are discussed by Keener (85), and
Clark (26). Two wide surveys of high school seniors have been 
made, one in Massachusetts, comprising over three thousand cases 
by Colvin and MacPhail (29); another in Iowa, comprising 1,550 
cases, by Ruch (135). The Colvin study finds that 27 per cent of
the seniors are good college risks, 25 per cent questionable and 52 per 
cent bad college risks. Allen (1) finds that 885 Pennsylvania high 
school seniors have the same median scores on the Brown University
Tests as the Massachusetts seniors. Thorndike (159) discusses
the intelligence of high school freshmen from the point of view
of their ability to study Algebra. Symonds (145) and Guy (57) discuss
the relation between intelligence and ability in Algebra, and
Hughes (76) similarly with relation to Physics. Feingold (41) discusses
the relation between intelligence and persistency in high school.

A study of over two thousand preschool children is reported
by Mitchell (103) and group tests given to large numbers of elementary
school children are reported by Woody (180) and the Report
of the St. Louis Division of Tests and Measurements (132), in
which 8,998 cases tested by the Pintner-Cunningham Test show a
median I.Q. of 100. Tests of vocational school pupils show very low
I.Q.s according to Plenzke (126). Gray and Marsden (54) report
Stanford-Binet tests given to English school children.


The Feebleminded. Porteus (127) presents several studies of
mental deviations in which various kinds of tests have been given
to the feebleminded, and Wallin (171) collects the results of seven
years’ work in a school clinic. Doll (37) discusses the definition of
feeblemindedness and the limits in terms of intelligence. Kuhlmann (90)
makes a plea for a state census of the feebleminded and
shows by a survey of eight towns in Minnesota that the percentage
of feeblemindedness ranges from 2.4 per cent to 8 per cent with an
average percentage of 4.7. Vanuxem’s (169) survey of a colony of
feebleminded women offers as its chief contribution the limits in
terms of mental age for different kinds of institutional work, and
Merrill (99) shows the range and median M.A. and I.Q. for many 
tasks done by defectives. Two papers describe so-called idiots 
savants, one with a phenomenal memory by Otis (115), and the 
other with exceptional musical talent by Minogue (102). 

The Superior. A great deal of work has been done with children 
of high I.Q. Terman’s (154) monumental work is of chief importance.
It gives the best and most accurate picture of the mental
and physical traits of children with I.Q.s above 130. It will for a
long time be the standard source book in this field. The Twenty-third
Year Book of the National Society for the Study of Education (113)
brings together a number of valuable articles by many
authors with reference to problems centering around the training of
gifted children, and this same problem is treated by Stedman (141)
in another book, in which many case histories and samples of work 
are given. More popular and general articles are contributed by 
Hollingworth (73) and Terman (152). Two detailed studies of 
special classes for very bright children have been made by Coy (31) 
and Race (129). Johnson (82) shows that very bright students,
mostly high school students, are judged by their teachers to stand
high in desirable traits. Van Alstyne (168) finds that character and
emotional difficulties may prevent gifted children from doing satisfactory
school work. Chassell (24) studies three gifted children
with very inferior motor ability and Bush (21) finds them inefficient 
in ability to organize. Moede et al. (104) describe the selection of 
gifted children for the special schools in Germany. 

College Students. Of the numerous reports of intelligence tests
in colleges and universities only a few can be mentioned here. The
most important book is that of Wood (177), who deals with the
whole problem of measurement in higher education and in particular
with the results of intelligence tests given at Columbia. MacPhail (94)
also gives a survey of intelligence testing in colleges and then goes
into detail as to the tests given at Brown University. A shorter
summary of the status of mental testing in colleges and universities
up to 1923 is given by Laird and Andrews (91). Reports of different
intelligence tests given at specific universities are contributed by


18), Burwell and MacPhail (20), Perrin (117), Root (134), 
Symonds (147), Terman (150), Viteles (170), Johnson (79).
Kuenzel and Toops (89) report the results of tests given over a

ear period. Erffmeyer (40) and Jordan (84) discuss the relation
of intelligence ratings to failure and dropping out of college, and
Laird (92) attempts to find the factors which cause low correlation
between grades and intelligence. Averill (3) and Whitney (174)
give the results of intelligence testing in normal schools, and
rs (183) compares the intelligence ratings of students preparing 
for teaching with other students in the same college. Burtt et al. (19) 
report an experiment in sectioning college students on the basis of 
intelligence tests, and Miner (101) discusses the intelligence ratings 


of twelve candidates for the Rhodes Scholarship.

The Delinquent. In Burt’s (17) comprehensive work on the
young delinquent intelligence testing is given a prominent place. He
finds 7.6 per cent definitely defective as contrasted with 1.2 per cent
among his nondelinquent control group. This book is the most
important recent contribution to the problem of juvenile
delinquency. Delinquent girls have been tested by
Mathews (96), and misdemeanants have been given the Pintner Non-Language
Test by Hamill (61). Several thousand penitentiary
inmates have been given the Army Alpha by Murchison (108, 109,
110), who finds these delinquents as intelligent as, if not more intelligent
than, the general white draft in the army. Undesirable
behavior and its relation to intelligence has been studied by Hag- 
gerty (60) and Blanchard and Paynter (15). 

Racial Comparisons. Most of the intelligence testing for this 
purpose has recently been reviewed in the BULLETIN by Garth,*
and only those contributions not mentioned by him will be dealt with
here. Garth and Whatley (47) find a median I.Q. of 75 on the
N.I.T. for 1,272 negro children. Bere (11) makes a careful comparative
study of Italian, Bohemian and Hebrew children, and
Seago and Koldin (138) compare Jewish and Italian children.
Pintner (121) compares American and foreign children on a non-verbal
and a verbal test, and Colvin (30) considers the language
handicap of the Italian child. Fukuda (44) gives the median I.Q.’s
for many different racial groups. Studies involving the races in 
Hawaii have been made by Symonds (148) and Murdoch (111). 
Garth (46) gives results for over a thousand full blood American 
Indian school children. 

Inheritance. Hildreth (72) makes a comprehensive study of the 
resemblance in intelligence and achievement with over one thousand 
pairs of siblings. Hart (62) and Pintner (122) also contribute 
similar data. Dexter (35) studies the resemblance in intelligence 
between cousins, and Morrees (105) the intelligence of the parents 
of feebleminded children. Cobb and Hollingworth (27) show that 
there is a marked regression in intelligence toward the mean of the 
general population for the siblings of children testing above 135 I.Q.
Haggerty and Nash (59) in this country and Duff and Thomson (38)
in England compare the mental capacity of children with parental
occupation.

There are many other articles which describe work with intelligence
tests, which we have been unable to include in the space allotted
to this topic, covering as it does the contributions of several years.
Bibliographies, including both intelligence and educational tests, are
given in the review of Educational Tests by Jones and McCall in
this number of the BULLETIN.

* Garth, T. R. A Review of Racial Psychology. Psychol. Bull., 1925,
22, 343-364.

EDUCATIONAL TESTS
BY V. A. JONES AND W. A. McCALL


Teachers College, Columbia University 


Introduction—Stages in Use of Educational Tests. Primarily 
this article is designed to deal with recent trends as seen in the camps 
of the specialists in test construction, but perhaps conditions in the 
camps may be seen in a clearer light if we glance* around outside 
first. Woody (34) has noted three stages in the development of the 
use (or should we say users?) of standardized tests. The first is the 
“curiosity” stage in which school people, hearing of the innovation
in education, give tests mainly for the purpose of seeing what will
happen. The second stage might be termed the survey stage in which
the predominant idea is the determining of levels of achievement, of
comparing school system with school system, school with school, or
class with class. Comparative achievement is the keynote. The third 
stage is one in which standardized tests are given and their results 
utilized for the purpose of improving instruction at specific points. 
“In this stage,” says Woody, “measurement comes to be recognized
as a most fundamental part of teaching. . . . There is much vari- 
ation in the rate at which they (the users of tests) move from stage 
to stage, but unless they finally arrive at the stage at which emphasis 
is placed upon desirable improvement in instruction, they are apt to 
give up the use of tests and declare that measurement is another 
educational fad.” In this third stage, it should be noted, there may 
be school and school-system diagnosis, or class and pupil diagnosis, or 
both. Also it should be noted that testing for survey purposes is in 
itself not necessarily reprehensible, though if it goes no farther many 
possible benefits are lost. 

* Due to the fact that our task is to review briefly the developments in the
field of educational measurement over the last five years, it is obvious that
only a “glance” can be taken. Throughout the article only one or two workers
out of many who have made contributions on a given topic can be mentioned.

Trends in Thought Concerning Measurement and Goals in Education.
As educators become more convinced that measurement of
certain aspects of school products has found a permanent place in the
educational scheme, more and more serious do they become concerning
the relation of measurement to goals. The expressions from
the specialists in test construction during the last five years do not
lead one to believe that their general concept of this relation has 
changed since the beginning of the test movement. Their concept is, 
and has been from the beginning, that the function of measurement 
is not to set up ultimate goals, but to measure progress toward the 
goals that have been set up in accordance with the philosophy of education.
This is what McCall (25) meant when he said that “the final
answer to every educational question except one must be left to
educational measurement.” Measurement is like the log of a ship:
it registers the knots made in some direction, but it does not indicate 
whether the ship is heading toward the desired port or a sand bar. 
The function of the compass and that of the log should not be con- 
fused. This is the theory in the matter, but either misunderstanding 
of the theory is so widespread or application of it is so difficult that 
many writers have given serious attention to it, and constructive 
suggestions have been offered. The difficulty of applying the principle
given above can be illustrated in the field of curriculum construction.
Many writers seem to feel that however the distinction
may be drawn theoretically between measurement of results obtained
and the goals set up by the curriculum, the results which are measured
will react upon the curriculum as applied by the average teacher, if
not upon the curriculum as it exists on paper. Wilson and Hoke (42)
say that we are approaching in measurement a stage in which “the
tests will be weighed and judged as to the fundamental considerations
of curricula making involved, whether they are or are not testing
desirable school products, and whether their use will or will not lead
to better methods of teaching and better selection of subject-matter.”
Upton (39) pleads that the specialists in test construction choose 
for their tests only those elements which measure facts and skills 
which the children will need to know in life situations, and that they 
present these facts and skills in a manner consistent with scientifically 
established methods of teaching. Woody (48) notes a growing conviction
on the part of many that “the best way to guarantee the
teaching of a particular aspect of a subject is to construct a test which
purports to measure that aspect of the subject.” There are two general
suggestions, therefore, as to how specialists in constructing tests
may aid, or at least not interfere with, the development of curricula
in accordance with current philosophy, namely: (1) Examine more
carefully the elements that go into the educational test; and (2) Test
more aspects of education.

* Note that emphasis here is placed on ultimate goals. It should be said
that many writers feel that measurement can function in determining intermediate
goals in an indirect way. Some think it may aid in the location of
specific objectives in particular grade or age levels; some think it may assist
in evaluating the more immediate objectives in the scheme defined by the
ultimate goals.

Tests and Methods of Teaching. Some fear is expressed now and
then by specialists in methods that educational tests may tend to
undermine some of the commonly accepted procedures. However,
little of a constructive nature has been offered besides the ideas
exemplified above in the writings of Wilson and Hoke, and Upton.
It is interesting to note that one of the latest trends in test develop- 
ment, viz., that toward diagnosis of difficulties of individual children, 
has met with more or less general approval from administrators and 
teachers. The most severe criticisms of the use of present tests for 
diagnostic purposes come from the specialists themselves, as will be 
shown later. Likewise prognostic tests and practice tests will have 
less difficulty probably in meeting the criticisms of administrators and 
specialists in methods than in meeting those of the educational 
psychologist. 

Extension of Measurement versus Intensive Study of Present 
Instruments. Having given some attention to trends which might be 
noticed more clearly by not coming into too close a contact with the 
specialists, let us now look at the developments from a closer range. 
One of the issues which assumes rather large proportions is the ques- 
tion of the extension of measurement to wider fields versus more 
exact measurement in the fields already tried. Though it will be 
impossible to divide the specialists into two opposing camps on few, 
if any, of the issues that will be presented in this article, it will be 
possible to show that one writer tends to emphasize more strongly 
one aspect of a question, while a second writer will give special atten- 
tion to another. Thorndike (34) pleads for refinements, for more 
valid and more reliable measures in old fields. He says: “From the 
research field we should seek better tests with more numerous forms 
than are now available. Improvement in the instruments of measure- 
ment, both of intellect and of school achievement, is more desirable 
than multiplication of their number.” Gates (14) shows conclusively 
that some of the best of our educational tests are insufficiently 
refined for exact individual examination in short periods of time. 
He would not only ask that an author of a test submit his proposed 
instrument to rigid tests for validity and reliability, but he would also 
insist that the author make a study of the function or functions which
he was attempting to measure. Judd (22) and Gray (18) would 
agree most heartily with this demand for attention to the psychology 
of learning of the subject to be tested. 

The question is viewed from a different angle by Woody (34) 
and Hudelson (21). The former predicts the development of addi- 
tional forms of existing tests, the utilization of many informal tests 
patterned after the standardized tests, and the production of tests 
for the measurement of “ higher types of educational activity.” He 
utters the warning that if our trained workers do not produce tests 
fast enough to keep up with the demand, the untrained will make an 
attempt to augment the output. Hudelson confines his remarks to 
composition scales. He thinks that a local composition scale stimu-
lates pupils and teachers sufficiently for “each school or at least each
school system to feel justified in devising one.” He would insist that
the local scale be based on a standardized scale of established merit. 

Chapman (6) in 1921 expressed the idea that through extension 
of measurement into new fields greater accuracy on the whole 
would result. Stating that the balancing of errors within the experi-
ment was one of the elementary principles of scientific method, he
said: “In one part of the experiment we must not attempt to get
accuracy to one-tenth of a per cent, when in another part of our
experiment there is an obvious error of 3 or 4 per cent.” In this
same article we find another statement which should be quoted in
order that Chapman’s attitude toward other trends, to be discussed
later, may not be misunderstood. The statement is: “Obviously the
school situation demands tests and yet more tests, tests that are suffi-
ciently accurate, not to reveal the acumen of the men who construct
them, but for the practical uses to which they will be put.” Regard- 
less of one’s idea concerning the questions of more measurement and 
better measurement, he will agree with Chapman that the accuracy 
desired for a given test will depend upon the use to be made of the 
results. Probably it would be possible to harmonize around this idea 
the different thoughts expressed above, but for our purposes this 
would seem to be unwise. It is illuminating to know that if 10 
persons look through a prism from enough angles all will see prac- 
tically the same colors of the spectrum, but scientific discoveries in 
this field have come from those who have insisted on examining the 
spectrum from the particular angles which they chose. 
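
Chapman's principle of the balancing of errors may be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes, which the quoted article does not state, that the component errors are independent and combine as the square root of the sum of their squares. The function name is hypothetical, and the figures of 0.1 and 3 per cent are borrowed from the quotation.

```python
import math


def combined_relative_error(*component_errors_pct):
    """Combine independent relative errors (in per cent) in quadrature.

    Assumption for illustration only: the errors are independent,
    so their squares add.
    """
    return math.sqrt(sum(e ** 2 for e in component_errors_pct))


# One part of the experiment is accurate to 0.1 per cent,
# another carries an obvious error of 3 per cent.
coarse = combined_relative_error(0.1, 3.0)

# Refining the first part tenfold barely changes the total.
refined = combined_relative_error(0.01, 3.0)

print(f"total error with 0.1 per cent component:  {coarse:.4f}")
print(f"total error with 0.01 per cent component: {refined:.4f}")
```

Under this assumption the total error is governed almost wholly by the largest component, which is the practical point of the quotation.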

In practice, both intensive study of measuring instruments in old 
fields and extension of measurement to untried fields are taking place. 
In the following sections the reader will be given proof of the fact 
that attention is being given to refinements and accuracy in measure-
ment. That extension of tests and scales into new fields has taken
place recently is attested by the fact that within the last five years
instruments have been made available which purport to measure the
following: ability to judge poetry, ability to judge orchestral music,
ability to pick out the central thought of a paragraph, ability to weigh
foreseen consequences, Biblical information, health knowledge,
achievement in sewing, knowledge of mechanics in language,* etc.

Combining of Results of Mental and Educational Tests. In 1921
Pintner and Marshall (31) reported the combined results from a
mental examination and an educational test. The difference obtained
between the mental and educational indices was used as a device for
determining whether a given child or class was achieving as much
educationally as the average child or class of like capacity. Pintner
had suggested in 1918 the possibility of developing such a measure,
but it was not until the period which we are reviewing that this
development actually took place. The F score developed by McCall
should be mentioned under the difference method. About the same
time that the difference method of combining results from mental
and educational tests was being worked out, Franzen (13) was
developing the accomplishment quotient technique. This quotient is
obtained by dividing the educational quotient (i.e., educational age
divided by chronological age) by the intelligence quotient. The
general scheme of comparing educational achievement with native
capacity to achieve is one of the
most important devices for practical educational purposes that has 
been developed in the field of tests and measurement. It enables a 
supervisor to discover groups which are accomplishing more or less 
than the average of similar mental endowment. It enables a teacher 
to discover, with a fair degree of precision, whether individual pupils 
are achieving as much in measurable school subjects as one would 
expect in view of their capacities. 
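
The two techniques just described can be set down as a short computation. This is a minimal sketch, not Franzen's or Pintner and Marshall's own procedure: the function names are hypothetical, ages are taken in months, quotients are multiplied by 100 after the usual convention, and the sign of the index difference (educational minus mental) is an assumption, since the text does not fix it.

```python
def quotient(test_age_months, chronological_age_months):
    """Age-based quotient scaled to 100 (e.g. educational or intelligence quotient)."""
    return 100.0 * test_age_months / chronological_age_months


def accomplishment_quotient(educational_age, mental_age, chronological_age):
    """Educational quotient divided by intelligence quotient, scaled to 100."""
    eq = quotient(educational_age, chronological_age)
    iq = quotient(mental_age, chronological_age)
    return 100.0 * eq / iq


def index_difference(educational_index, mental_index):
    """Difference method; the sign convention here is an assumption."""
    return educational_index - mental_index


# Hypothetical pupil: educational age 120 months, mental age 132 months,
# chronological age 126 months.
aq = accomplishment_quotient(120, 132, 126)
print(f"accomplishment quotient: {aq:.1f}")
print(f"index difference: {index_difference(95, 105)}")
```

Because chronological age appears in both quotients it cancels, so the accomplishment quotient reduces to educational age divided by mental age.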





Paralleling this tendency to combine educational and mental rat- 
ings of pupils has gone the development of general survey or omnibus 
tests. Not only does one find a tendency to group several educational 
tests into one battery, but also a tendency to combine in one booklet
an intelligence test and two or more educational tests.

* The authors and dates of publication of these tests, taken in order, are:
Abbot and Trabue, 1921; Trabue, 1923; Woody, 1923; Chassell, 1924; Laycock,
1925; Gates and Strang, 1925; Murdock, 1922; Wilson, 1922.

Teachers and administrators who have become acquainted with
omnibus tests and the methods of combining the results of mental and
educational examinations have, in general, favorably received both,
but especially the latter. They see in the omnibus test the possibility
of converting several measures of a child into a single figure without
having to face the baffling task of combining incomparable scores.
They receive rather enthusiastically the idea of combining achieve-
ment and intelligence measures, because they see in it a helpful
method for diagnosis. Among the specialists, however, one finds
interesting differences of emphasis, especially in connection with the
employment of accomplishment quotient and similar techniques. In
examining the following comments, the reader is reminded of the
fact that the accomplishment and difference techniques may be applied
to a comparison of intelligence measures with ratings in one school
subject or with measures for several school subjects combined.

Thurstone (35) predicts that measurement scales in which school
achievement and alertness are both measured will be used extensively
in the future. Franzen (13) recommends the use of the accomplish-
ment quotient for detecting strengths and weaknesses in school
systems, in individual schools, and within individual classes. He also
suggests its use as a school mark. Stebbins and Pechstein (33) stress
it as being the best measure of the efficiency of a teacher.
Pintner and Marshall (31), in discussing the implications of the
educational-index difference, are cautious in their statement
as to the use of this difference as a norm or standard. They show
that about 50 per cent of school children obtain index differences
between +8 and -8. But they point out that though this was the
condition found with the mass of children, they are not assuming
that this group was actually working up to its mental capacity.
Henmon (19), in the light of his findings, is skeptical of the accuracy
of accomplishment quotients in history, algebra, and reading. He is
doubtful if we can safely claim to measure educational products with
sufficient accuracy to go much beyond the comparison of schools and
classes. Toops and Symonds (37) contend that accomplishment
quotients which are based on intelligence quotients derived from
different mental tests, and educational quotients from different
educational tests, cannot be compared. Chapman (8) criticizes the
use of accomplishment quotients for individual pupils, unless many
measures of intelligence and school success are made. He con-
tends that to some extent intelligence and achievement tests
measure the same thing; that to some extent they are both
unreliable. He is amazed that accomplishment quotients are fre-
quently deduced from but one intelligence and one educational test.
From a theoretical group in a single grade he finds that “ the measure 
of intelligence and that of school success would have to be made six 
times to give a sound basis for mental-educational-achievement 
results.” Kelley (23) agrees in general with Chapman’s views, but 
from a calculation of the reliability of quotients he concludes that 
Chapman is too critical of the reliabilities of accomplishment 
quotients and like measures. Ruch (32) points out that the accom- 
plishment quotient technique assumes that educational abilities corre- 
late perfectly with mental ability, at least when pupils are motivated 
to the maximum. He mentions the fact that Franzen obtained an r 
of only between +.70 and +.85 from a sampling of pupils who had 
been highly motivated, and says that with an average group we should 
expect the r to be lower. Moreover, he thinks that accomplishment 
quotients will be especially fallible unless long-time testing is done. 
Gates (15) utters a warning against allowing total educational scores 
to blind one to important differences in achievement in the various 
subjects. He finds that the correlation between scores on tests in 
different school subjects is not high, and concludes that the com- 
paratively low correlations were probably due in part to differences 
in original or acquired aptitudes. He emphasizes the point that to 
the extent that specialization in subject-matter exists to that extent 
would one question the validity of the accomplishment quotient and 
similar practices based on the assumption of slight specialization. 
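
The force of Chapman's objection, that the two tests partly measure the same thing and are both unreliable, can be seen in a small calculation. The sketch below does not reproduce Chapman's or Kelley's computations on quotients. It substitutes the classical formula for the reliability of a simple difference between two scores of equal variance, a related but different quantity, and all the coefficients shown are hypothetical.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of the difference X - Y for equal-variance scores.

    r_xx, r_yy: reliabilities of the two tests.
    r_xy: correlation between the two tests.
    """
    if r_xy >= 1.0:
        raise ValueError("correlation between tests must be below 1")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


# Hypothetical figures: two tests each with reliability .90.
highly_correlated = difference_reliability(0.90, 0.90, 0.80)
weakly_correlated = difference_reliability(0.90, 0.90, 0.50)

print(f"tests correlating .80: {highly_correlated:.2f}")
print(f"tests correlating .50: {weakly_correlated:.2f}")
```

On these assumed figures, the more closely the intelligence and achievement tests correlate, the less dependable is any measure built on the discrepancy between them, which is why repeated measurement is urged.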

Provided sufficiently accurate measures could be obtained of intel- 
ligence and of achievement in the separate school subjects, all of the 
adverse criticisms would vanish with the possible exception of Ruch’s. 
Gates’ criticism would still be launched against general accomplish- 
ment quotients, but not against subject quotients. In view of the 
great demand for such a measure as the accomplishment quotient, 
it seems safe to predict an increasing amount of research directed 
toward the improvement in the validity and reliability of tests in 
the future. 

Use of Tests for Diagnosis. During the last five years much 
emphasis has been placed on the diagnostic values of standardized 
tests, and some research has been done in connection with the con-
struction of tests designed especially for diagnostic purposes.
Fernberger (12) thinks that so much emphasis and attention has 
been given to statistical treatment of final test scores that many 
facts of diagnostic value in test results have been overlooked. 
Monroe (28) assumes that all tests worth giving have some diagnostic 
value. He says: “The fundamental thing for the teacher to bear
in mind is the principle that only when we interpret the scores of a 
test in terms of pupil needs and modify instruction to meet these 
needs will educational tests fulfil their function.” Gates (17) feels 
that standard tests will be used more and more for diagnosis of the 
weaknesses of individual pupils; more and more in testing the 
efficiency of methods of teaching. He thinks that special attention 
should be given to the psychology of learning of the subject by one 
who plans to construct a diagnostic test. For an illustration of his 
researches along this line, the reader is referred to his work in the 
construction of a test in pronunciation. Gray’s admirable work (18) 
in diagnosis and treatment of cases in reading has many suggestions 
for anyone who wishes to construct tests of diagnostic value in that 
subject. Buswell (5) thinks that “the most pronounced limitation 
of standardized educational tests is that they deal with results rather 
than with processes back of the results; with complexes rather than 
elements. The tests give a comprehensive score, but do not analyze 
the components of the mental processes involved.” The Consultative 
Committee on Tests (9), which met in London in 1924, and which 
received reports from various countries on the use, development, and 
interpretation of tests, makes two interesting comments on the topic 
under consideration. This committee thinks that when a test is used 
for diagnostic purposes it is being assumed, in the first place, that 
the child has passed his life in a certain environment; and, in the 
second place, that the child has made as much use as his intelligence 
has permitted of the stimuli afforded by such an environment. In 
the latter statement the committee is doubtless referring to tem- 
perament in general. 

Prognostic Tests. In prognostic tests, as in diagnostic tests, 
emphasis is placed on measures for individuals, consequently the 
problem of accuracy of measurement is of paramount importance. 
Since measures which are reliable for individuals are obtained with 
so much difficulty, practically all the tests submitted for prognostic 
purposes are prefaced by a warning by the authors against over-con- 
fidence in the results obtained. The general feeling among writers 
on the subject is that educational tests of this type are in the early 
stages of experimentation. The work of Seashore in the construction 
of tests of musical ability and the research of Briggs and Kelley (2) 
in measuring ability to learn foreign languages, are sufficient to show 
that this field is receiving serious attention. The fact that the possi-
bilities of applying tests of this type in business have seemed so bright
to some makes the more critical of the psychologists a little fearful
lest the results of these tests be used with too little discretion.

Practice Tests. It has been mentioned repeatedly that one of the
main trends in educational measurement has been in the direction of 
perfecting instruments which would assist class-room teachers in 
coping more effectively with their complex problems. The develop- 
ment of practice tests illustrates definitely this tendency. The practice 
tests in handwriting devised by Courtis (11) and the practice lessons 
in reading by McCall and Crabbs (26) are examples of this develop- 
ment. The contention of Judd, Freeman, Gray, and Gates that 
specialists in test construction should give much attention to the laws 
of learning in the subjects tested deserves great emphasis here. It 
is interesting—but not encouraging—to note that practice tests are 
being placed on the market more rapidly than convincing discussions 
of the specific functions of practice tests in the psychology of learn- 
ing of the particular subjects are being produced. To be specific, 
how can one defend scientifically a practice test in geography?

Continuous Revision of Norms, and Improvement in Validity and
Reliability. Burt (4), in commenting on standardized educational
tests in England and America, utters a warning against “the danger
that lurks in giving prominence to the average score.” He continues 
thus: “ Just as an official minimum wage tends to become the maxi- 
mum, or at least to limit it, so a risk arises lest better performance 
should tend to be depressed toward the mean.” Thorndike (34) 
points out that norms are in need of almost constant revision due 
to changing school conditions and teaching techniques. 

There is a feeling among many that improvement in validity may 
result if the test elements are chosen, as far as possible, on the basis 
of the findings of thorough studies of present usage; such, for 
example, as Wilson’s study of social and business usage of arithmetic, 
or Thorndike’s word count in reading. Thorndike would seem to 
favor any trend in this direction. Courtis is apparently not fully 
convinced on the point, at least as it applies to arithmetic. Another 
tendency which some think may work toward the improvement in 
validity is the analysis of the test situation for factors other than the 
one or ones selected for measurement. For example, a reasoning 
problem in arithmetic, as was pointed out by Courtis in 1913, may 
not measure a child’s ability to solve problems, but his reading ability. 
Moreover, it has been generally conceded that a distinction between 
rate and quality would tend to improve the validity of tests in many 
subjects, provided some satisfactory method could be devised for 
combining the two in cases where this was desired. In difficulty tests
the factor of speed is left relatively uncontrolled in most of the
present measurements. If two children receive the same score on
a difficulty test, say an arithmetic reasoning test, one cannot conclude
that the two children have equal ability in the subject measured, 
unless he assumes (in addition to perfect reliability) either that both 
children work with equal speed, or that a level is suddenly reached 
where time does not contribute to success. Some workers have seen 
the problem but, not knowing how to combine rate and quality, have 
reported the two separately. This has not proved to be satisfactory. 
Thorndike proposed a technique several years ago which gave speci- 
fied weights to speed and accuracy at each step of the series, but it 
involved such intricate computation that it has been employed seldom. 
Courtis (10, 11) has given a great deal of attention to this problem.
Among his contributions on this topic is a formula for combining 
rate and quality in the case of handwriting. Gates (16), by empirical 
methods, has also devised a simple formula for combining rate and 
quality in handwriting. 

Increase in the reliability of tests must come chiefly through 
statistical study. The only suggestions that have been offered re- 
cently, other than statistical techniques, are: (1) The increase in 
the length of the test and of the testing time; and (2) the multiply- 
ing of the number of alternate forms. In the field of statistical 
procedures, special mention should be made of two methods which 
have been proposed recently: First, the overlapping method devised 
by Vincent (41) for dealing with items which are dichotomously
scored. Second, the method developed by McCall (27) which can
be applied to items however scored. Both of these techniques can be
used to measure reliability or validity of test items. Studies the aim
of which is to select the most valid and reliable of existing tests for a
given subject may be illustrated by the work of Breed and Harris (1) 
on algebra tests; Gates’ (14) research on reading tests; and Hudel- 
son's (20) work with English composition. 
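
The first of the non-statistical suggestions, lengthening the test, is commonly quantified by the Spearman-Brown formula. The text does not name this formula, so the sketch below is offered only as an illustration of the reasoning. The function names and the starting reliability of .50 are hypothetical, and the formula assumes that the added material is comparable to the original.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability of a test lengthened by `length_factor`."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


def lengthening_needed(reliability, target):
    """Factor by which a test must be lengthened to reach `target` reliability."""
    return (target * (1.0 - reliability)) / (reliability * (1.0 - target))


# Hypothetical test with reliability .50.
tripled = spearman_brown(0.50, 3)
factor_for_90 = lengthening_needed(0.50, 0.90)

print(f"reliability when tripled in length: {tripled:.2f}")
print(f"lengthening needed to reach .90:    {factor_for_90:.1f}")
```

The same arithmetic applies when alternate forms are multiplied and their scores pooled, provided the forms are equivalent.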

Trend Toward Uniform Method of Scaling Tests. From the 
beginning of the development in educational tests, there has been 
a demand for a uniform method of scaling tests and of reporting the 
final score. This demand has become more and more insistent in 
the last few years, due, probably, in no small measure to: (1) The 
rapid multiplication of tests; and (2) the development of school 
bureaus of research whose workers are faced with the problem of 
interpreting the results of tests to school people who are not specialists 
in test construction. In order to secure a common unit and a common
zero point on different tests, a number of plans have been proposed 
for transmuting point scores into derived or scale scores. Bucking- 
ham (3) has suggested that point scores yielded by a test be always 
expressed in grade units. But realizing the desirability of having a 
method whereby educational and intelligence measures could be 
brought together, Monroe (28, 29), among others, has suggested 
the age unit for certain purposes. Pintner (31) has devised the 
method of reporting the score of a child in terms of the standard 
deviation of his own age group. McCall (24) has proposed that all 
scores be referred to a standard group, namely a random sampling 
of twelve-year-old children, and that all measures, with statistical 
correction for age where necessary, be reported in terms of T (an 
adaptation of the standard deviation). Van Wagenen (40) has de- 
vised a simplified method of using a technique employed by Kelley 
in 1916. He has named this system the C Score Scale. Thur- 
stone (36) has proposed a method of scaling tests which is similar 
in some respects to Pintner’s method. It assumes that the distribu- 
tions of ability in the several age or grade groups are normal, but it 
allows freedom of variation for the means and for the standard 
deviations of the several age groups. He illustrates the method with 
intelligence test data, but the method is suggestive for workers in 
educational tests. 
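
Two of the proposals above lend themselves to a brief illustration: reporting a score in standard-deviation units of the child's own age group, and referring all scores to one standard group. The sketch below is a simplification made under stated assumptions. It uses a plain linear conversion with a mean of 50 and a standard deviation of 10 for the T-type score, whereas McCall's published procedure is more elaborate; the function names and the scores are hypothetical.

```python
from statistics import mean, pstdev


def sigma_score(raw, group_scores):
    """Score expressed in standard deviations from the mean of a group."""
    return (raw - mean(group_scores)) / pstdev(group_scores)


def t_type_score(raw, reference_scores):
    """Linear T-type score: mean 50, standard deviation 10 in the reference group.

    Simplifying assumption: a linear transformation of the sigma score.
    """
    return 50.0 + 10.0 * sigma_score(raw, reference_scores)


# Hypothetical point scores of a reference group on one test.
reference_group = [10, 20, 30, 40, 50]

print(f"sigma score of 40:  {sigma_score(40, reference_group):.2f}")
print(f"T-type score of 40: {t_type_score(40, reference_group):.1f}")
print(f"T-type score of 30: {t_type_score(30, reference_group):.1f}")
```

Both conversions yield a common zero point and a common unit, which is the end the several scaling plans have in view; they differ chiefly in the group taken as the standard.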

With so many different proposals being made, one might well 
ask for proof that there is a trend toward uniformity in the method 
of scaling tests. The proof is in the fact that in every article men-
tioned above the author explicitly states or implies that he
sees the need for a uniform method, at least where the type of
response to be scaled is similar, and only a few writers add the
qualifying clause.

Except for a few important articles, this bibliography covers the
period from 1920 to 1925 inclusive. The productivity of these years
has been so great that it has been necessary to omit all reference to
scales and techniques even where their use has been so guarded
as to approach objectivity of measurement. There have been omitted
also all vocational tests, which are summarized by Manson, all but a
few physiological tests, included because of their more direct bearing
on problems of character, and all tests of more or less relevant
knowledge, such as biblical information tests.


Summaries. Allport, G. W. (4), Allport, F. H. (1), Holling-
worth (86), Manson (113), May (122), May and Hartshorne (125),
Symonds (167), Watson, G. B. (184), Wells (190).

Batteries. Made up of certain of the tests listed below and
giving extensive measures of various aspects of character and
personality. Cady (24), Five tests of juvenile incorrigibility.
Downey (44, 45), Twelve tests of will-temperament. Hartshorne
and May (82), Two batteries, two forms each, five tests each of
moral knowledge and discrimination. Kohs (91), Six tests of ethical
discrimination. Lentz (107), Seven tests of delinquent tendencies.
Marston, L. R. (115), Five tests of introversion and extroversion.
Pressey (145), Four tests of emotionality. Raubenheimer (150),
Twelve tests of potential delinquency. Union School of Re-
ligion (175), Two batteries, two forms each, one elementary, one
secondary, one battery for religious ideas, one for ethical judgment.
Voelker (178), Ten tests of trustworthiness. Watson, G. B. (183),
Six tests of fairmindedness in the field of economics and religion.

This bibliography has been prepared in connection with an Inquiry in
Character Education made possible by a grant to Teachers College from the
Institute of Social and Religious Research.

C. Tests and Techniques Intended Primarily to Measure Objec-
tively (and Mainly in Terms of Conduct) Certain Personality Traits
and Types of Behavior.

1. Aggressiveness. Moore and Gilliland (185). Measured by the
difference in quality and quantity of certain types of mental work
done under normal or standard conditions and the same or similar
work done under conditions involving certain social and mechanical
distractions.

2. Ascendence-Submission. Allport and Allport (2). Measured 
by the strength of reaction to an imaginary situation containing a 
mild rebuke. 

3. Caution. Brown (16, 17), Manson (114). Measured by the 
relations between the items omitted, attempted, right and wrong in 
an intelligence test, especially by the ratio of the items attempted to
the items wrong. Marston, L. R. (115). Measured by reactions to
situations requiring that subject “take a chance.” 

4. Compliance. Marston, L. R. (115). Measured by length of 
time a child will try to open a box containing a cherished object. 

5. Confidence. Trow (173). Experimental technique. Subjects 
judged length of lines, ethical situations, etc., and then rated the 
degree of confidence in the accuracy of their judgments. 

6. Conformity. Deutsch (43). Measured by the number of con- 
ventional preferences in a multiple choice paper and pencil test. 

7. Conscientiousness. May (123). Test of the conscientiousness 
of conscientious objectors to military service by a social and religious 
background questionnaire. 

8. Decision Speed. Bridges (10), Downey (44, 45, 51), Filter 
(59), Gibson (70), Trow (174). Various kinds of test material 
used with this central point: Speed of decision is measured by the 
time it takes the subject to decide in such things as checking traits, 
comparing weights, lengths of lines, judging the letter of greatest 
frequency on a card, etc. 

9. Expansion-reclusion. Allport and Allport (2). Measured by 
the type of reply made to a standardized want advertisement. 

10. Deception. 

(a) Detection (not measurement) by inspiration-expiration ratio 
before and after truth-telling or lying: Burtt (23), Marston (116, 
117). 

(b) Detection by systolic blood pressure: Burtt (23), Marston 
(118), Larson (105), Langfeld (103), Landis (98, 100). 

(c) Detection by associative reaction times: Marston (117, 116), 
Goldstein (72). 

11. Honesty. Many tests and techniques used. 

(a) The overstatement test: Voelker (178), Cady (24), Rauben-
heimer (150), Terman, et al. (168). Measured by difference between
claimed and actual abilities. 

(b) The paraffin paper test: Voelker (178), Cady (24). Dupli- 
cate test performance made on paraffin sheet removed before subject 
corrects his own paper. 

(c) The peeping technique: Voelker (178), Cady (24), Mur- 
doch (137), Terman, et al. (168). Peeping shown by degree of 
success in placing pencil point in scattered circles or in threading 
mazes when eyes are supposed to be shut. 

(d) Books read technique: Franzen (62), Raubenheimer (150), 
Terman, et al. (168). Measured by number of fictitious titles pupils 
check as having read. 

(e) Identical errors technique: Gunlach (74). Measured by num-
ber of identical errors students sitting near each other will make on
a short quiz. 

(f) Duplicating papers technique: May and Hartshorne (124). 
Papers duplicated by examiner before being corrected by pupils. 
Changes shown by comparison with duplicates. 

(g) The overchange test: Voelker (178). Will the subject keep
overchange?

(h) The missent letter technique: Voelker (178). Will the sub-
ject keep money mailed him “by mistake”?

(i) Technique to check students faking laboratory experiments:
Laird (96). 
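
The identical errors technique listed under (e) rests on a simple count, which may be sketched as follows. This is an illustration only and not the cited author's procedure: the function name, the answer key, and the two answer sheets are hypothetical, and the text does not say how the count was compared with what chance would give.

```python
def identical_errors(answers_a, answers_b, key):
    """Count items on which two students gave the same wrong answer."""
    return sum(
        1
        for a, b, correct in zip(answers_a, answers_b, key)
        if a == b and a != correct
    )


# Hypothetical five-item quiz and the sheets of two students seated together.
key = ["A", "B", "C", "D", "A"]
student_1 = ["A", "B", "D", "D", "C"]
student_2 = ["A", "C", "D", "D", "C"]

shared = identical_errors(student_1, student_2, key)
print(f"identical errors: {shared}")
```

In use, a count of this kind for neighboring students would presumably be set against the count for pairs seated apart, since some identical errors arise without copying.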

12. Incorrigibility. Cady (24). Uses a battery testing honesty, 
emotionality and moral judgment. 

13. Originality. Chassell (32). Twelve tests in which originality 
is supposed to be displayed. 

14. Perseveration. Bernstein (6), Lankes (104). A laboratory
technique for measuring this phenomenon and showing its relation
to persistency of will.

15. Perseverance. Fernald (56). Measured by the length of
time a subject can stand on his toes.

16. Persistence. Chapman (30). Measured by length of time 
pupils will stick to a disagreeable task. 

17. Self-assertion. Marston, L. R. (115). Measured by the 
length of time a child will play with a non-preferred toy without 
demanding the preferred one which he knows is available. 

18. Self-assurance. Filter (59). Measured by the differential 
between what the subject thinks he can do and what he actually 
does using (a) a string figures test and (b) a modification of the
Binet paper folding test. 
19. Self-estimation and Evaluation. Measured by: 
(a) The differential between self-estimates and scores on an 
objective test: Allport and Allport (2). 
(b) The differential between ratings of self as is, as should be,
and as others are on identical traits: Knight and Franzen (90), Chas- 
sell and Watson (36). 
20. Social Perception. Measured by ability to judge the emotion, 
feeling, or mental state exhibited in facial expressions and recorded 
in photographs. Standardized photographs are used: Allport (1), 
Landis (100), Langfeld (101, 102), Gates, G. S. (168), Ruck-
mick (157).
21. Social Resistance. Marston, L. R. (115). Measured by the 
degree of resistance in social barriers which a child will overcome 
in getting acquainted with a stranger. 
22. Studiousness. Measured by

(a) The differential between scores in an intelligence test and
scores on an unannounced quiz in classroom work: Symonds (165).

(b) The number of hours per week spent in study: May (121).

23. Suggestibility. Measured by

(a) A wide variety of suggestion tests: Whipple (191), Brown
(15), McGeoch (128), Town (170).

(b) A specially prepared paper and pencil test of twenty ele-
ments: Otis (140).

24. Trustworthiness. Voelker (178). Measured by a variety 
of performance tests. 

D. Tests and Testing Techniques Intended to Measure Primarily
the Affective Aspects of Personality. 

I. Instincts and Emotions. 

a. Laboratory Techniques for Measuring the Relative Strength of 
Emotions, and Instincts. 

1. By association times to words intended to tap various emo- 
tional types of complexes: Moore (133). 

2. By the differential between routine mental work done under 
normal or standard conditions and the same or identical work done
under emotional distractions of various sorts: Moore (134).

3. By judgment of the varying degrees of persuasiveness of
Hollingworth’s standard advertisements used in his table of Per- 
suasiveness of Advertisements: Folsom (60). 

4. By moving pictures of facial expressions and head movements
made in response to a wide variety of emotional stimuli: Landis (98, 
100). 

5. By measuring systolic blood pressure and inspiration-expiration 
ratio when the subject is exposed to emotional stimuli: Landis (98). 
And by these and the psychogalvanic reflex as well: Blatz (7). 

6. By recording bodily movements and expressions and verbal 
expressions under above conditions: Landis (98). 

b. Laboratory Techniques for Measuring General Emotional
Instability. 

1. Measured by the number of times the subject will withdraw 
his hand from an apparently very dangerous situation which he has 
been assured by the experimenter is perfectly harmless: Crane (42).

2. Measured by electrical changes in the body. The psycho- 
galvanic reflexes. This work has been summarized by Wechsler 
(187), and others. 

3. Extrovertive and introvertive tendencies have been measured by 

(a) a series of objective tests all described under C above by 
Allport and Allport (2) and 

(b) a series of five conduct tests described under C above by 
Marston, L. R. (115). 

c. Paper and Pencil Tests of Emotionality and Emotional
Instability.

1. The Pressey XO tests are well known and need not be
described here. They are described by Pressey in references (145,
146, 147, 148). Chambers (27, 28), Collins (38), Olsen (139), and
Sunne (164) have obtained good results with them. These
investigators have found that the Pressey XO tests will differentiate
certain psychopathic types of personality and degrees of emotional
maturity and predict academic success as well as, if not better than,
intelligence tests.

2. Emotional questionnaires. 

(a) The Woodworth Personal Data Sheet (194) was compiled
by Woodworth for use in the army. It consists of 120 questions 
intended to reveal psychopathic tendencies. Results have been pub- 
lished by Hollingworth (87), Everett (52) and Matthews (119). 

(b) The Woodworth-Matthews Personal Data Sheet (195) is 
an abridgement of the original Woodworth to seventy-five questions 
intended to test emotional instability of children. 

(c) Cady (24) used Johnson’s (89) modification of the
Woodworth Questionnaire, repeating it with all questions reversed in form.

(d) Chassell and G. B. Watson have extended the Woodworth
Questionnaire in their “Emotional History Record” (36).

(e) Laird (95) has published a further revision of the
Woodworth. The form of the questions has been changed from “Yes-No”
to the graphic rating scale type. Many questions regarding
extroversion-introversion have been added. Hoitsma (85) reports
statistics on its reliability.

(f) Other questionnaire studies in the field of emotions, instincts 
and other dynamic traits have been made by Freyd (67), Wash- 
burn (180), and Wells (189). The questionnaires used in these and 
other studies have not been published for general use. 

(g) An excellent statistical study of these measures of
emotionality has been made by Landis, et al. (100).


II. Mood and Temperament.

a. Laboratory Techniques. 

1. Cheerfulness and Depression, Optimism and Pessimism have
been measured in the Vassar laboratory by Miss Washburn and her 
students (181, 182, 136), by the use of the galvanic reflexes and 
the free association method. 

2. The effects of Encouragement and Discouragement on mental 
work have been tested by G. S. Gates and Rissland (69). And the 
effects of Praise and Reproval on increase in mental output have 
been measured by Hurlock (88). 

b. Paper and Pencil Tests. The Downey Will-Temperament
tests are the most outstanding of all efforts to measure the dynamic 
traits. The literature on these tests has been summarized briefly by 
Freeman (63) and Kuntz (94) and critically by May (122), in 
1925. Nothing has appeared since these reviews to change the 
general conclusions there reached. The accompanying bibliography 
lists all articles on the Downey tests to January, 1926. 


III. Attitudes, Interests, Preferences, Prejudices, etc.

a. General Social and Religious Attitudes and Interests. 

1. Case (26). A true-false test of religious and social opinions 
and attitudes. Applied to college students: Hartshorne (81). 

2. Hart’s Test of Social Attitudes and Interests (77, 79), and 
Personnel Assayer (78). By a special technique of recording likes 
and dislikes, favorable and opposed reactions, beliefs and disbeliefs, 
in respect to specific situations, he attempted a measure of a wide 
range of attitudes. 

3. Lund (109). A set of questions requiring scaled responses as
to belief in, certainty of, and desirability of the propositions involved
in the questions.

4. Raubenheimer (150). Used four attitudes tests: character 
preferences, reading preferences, activity preferences, and an associa- 
tion test, all aimed at measuring the child’s general attitudes on a 
variety of questions. 

5. Shuttleworth’s University of Iowa Assayer (161). Based on
the Hart technique. 

6. Travis (171). Made up an attitudes test based on selected 
traits aimed at measuring likes and dislikes of various traits. 

7. Watson’s Test of Public Opinion (183, 185). Consists of six 
tests designed to measure fairmindedness by eliciting the expression 
of prejudices on various religious, economic and political issues. 

b. Tests of More Specific Attitudes. 

1. Fair-mindedness. Measured by Watson’s tests above (185). 

2. International-mindedness. Measured by Manry (112) with a 
general information test. 

3. Money-mindedness. Measured by Shuttleworth using Hart’s 
technique (160). 

4. Open-mindedness. Measured by a questionnaire. Symonds 
(166). 

5. Public Spiritedness. Tested by Coe (37) with two forms of 
a free association test. 

6. Liberal or Progressive attitudes in politics have been measured 
with a questionnaire scale by Allport and Hartman (3). 

7. On Prohibition. Measured by a multiple choice test (39). 

8. On Race Relations. Measured by a multiple choice test (39). 

9. Social Distance, or attitudes toward various social groups. 
Measured by Bogardus (8) with his social distance test. 

10. Sociality or ability to “mix” well. Measured by Ream (151,
152) with his social relations test. 

c. Tests of Specific Interests. 

1. Measured by the number of irrelevant words crossed out in 
reading interesting and dry passages: Burtt (22). 

2. Measured by score in a “paired associates” memory test:
Burtt (22). 

3. Measured by information tests:

(a) Burtt (22). Based on subject’s supposed interests.

(b) Pressey (149). Interest in sports measured by information.

(c) Lentz (107). Recognition of pictures.

4. Measured by free association. Wyman (196) tested the
intellectual, social and activity interests of Terman’s gifted children
with a specially prepared free association test.

5. Measured by recording behavior of children in a museum.
Marston, L. R. (115).

6. Measured by having school boys hand in each day an envelope 
containing the most interesting thing they could find or a description 
of it. Lentz (107). 

7. Interest analysis. Both Freyd (64, 65, 66) and Miner (131) 
have published interest analysis blanks designed to detect vocational 
interests. 


E. Tests and Techniques Intended to Measure Primarily Social- 
Ethical Ideas and Judgment. 

I. Tests Requiring the Ranking or Rating of Situations. 

1. Brogan (11). Involves the ranking of sixteen offenses based 
on the worst practices of university men. 


2. Brotemarkle (13, 14). Seven lists of words, seven words in a
list. In each list the words are ranked according to shades of
meaning, the two extreme words being given.
3. Cady (24). Moral Judgment Test, consisting of rating acts 
on a four-place scale according to degrees of blameworthiness. 
4. Chapman (29). Measured pupils’ ideas of relative values by 
having them rank a list of reasons for (1) going to high school, 
(2) reading good literature, (3) saving money. 
5. Chassell and Chassell (33, 35). Tests of religious ideas con- 
sisting in ranking in their order a set of answers to seven questions. 
6. Fernald (54). Ethical Discrimination Tests. Requires the 
ranking of ten offenses, ten meritorious acts, and ten ambitions. 
7. Hartshorne (8). Ethical Discrimination Tests. Requires the 
ranking of children’s acts. 
8. Kohs (91). Ethical Discrimination Tests. Test Five is an 
offense rating test. 
9. May (120). Scale for measuring moral and religious values,
involving the ranking of twenty-five situations in three different ways.
10. Raubenheimer (150). Has an offense rating test in his 
battery. 
II. Tests Requiring Various Sorts of Responses to Imagined
Situations.

a. The Moral Dilemma type of Test.
1. Fernald’s Ethical Perception Tests (55). 
2. Hartshorne and May, Moral Knowledge Tests (82). Two
tests of this type, Comprehensions, of the Binet Comprehension type,
and Provocations, calling for a judgment of rightness or wrongness
on a specific act.

3. Kohs’ Ethical Discrimination Tests (91). Test I is of the 
Binet Comprehension type.

4. McGrath (129) used four tests of this general type. 

b. Tests of Foresight of Consequences. 

1. Chassell and Chassell (34). The subject is required to indi- 
cate the desirable and undesirable consequences of various solutions 
to a moral problem contained in a story. 

2. Hartshorne and May (82). Used two such tests, a story 
completion test, and a word consequence test in which the subject 
underlines the word that is the best and the one that is the worst 
consequence of the stimulus word. 

3. Myerson (138) used the completion type of test in which 
various phrases are filled in to complete the story. 

c. Tests of Ability to Recognize or Identify the Moral Element
in a Situation. 

1. Chassell (35). Parables interpretation test, multiple choice 
type, requiring the pupil to find the true meaning of the parable. 

2. Giles (71). A true-false test with twenty-five moral judg- 
ment questions. Revised by Hanson (76). 

3. Hartshorne and May (82). Have a Recognitions test requir- 
ing the pupil to indicate whether a given act is lying, stealing, 
cheating, etc. 

4. Lowe and Shimberg (108). Used the Binet Fables. 

5. McGrath (129). Used stories and required the pupil to state 
the lesson or moral of each. She also used pictures and required 
the child to identify the moral element in the situation. 

6. Pressey (144). Has a moral judgment test in his intelligence 
test in which the worst word in a list is crossed out. 

7. Schwesinger (159). Has developed a social-ethical vocabulary 
test. 

8. Van Wagenen (176, 177). Used historical incidents and
required the subject to judge the motive in each act.


NOTES AND NEWS


Dr. Carney Landis has been appointed assistant professor of
psychology at Wesleyan University. Dr. Landis has just completed
a two-year term as National Research Council Fellow.

Professor Harvey A. Carr has been made chairman of the
department of psychology at the University of Chicago.

Dr. Sidney A. Cook has been promoted to assistant professor of
psychology at Rutgers University. 

Vernon A. Jones has been appointed associate professor of 
educational psychology at Clark University. 

Professor Paul Thomas Young, of the University of Illinois,
has been granted a year’s leave of absence for the purpose of study 
and research at Berlin under a National Research Fellowship in the 
Biological Sciences. 

Dr. William H. Burnham, for twenty years head of the
department of pedagogy and the school of hygiene at Clark University,
retired from his professorship at the end of the university year.


EDITORIAL NOTICE 


Plans have been completed to start a new journal called
Psychological Abstracts on January 1, 1927. This journal,
which will probably be edited by Professor Walter S. Hunter of
Clark University, will be published by the American Psychological
Association. An effort will be made to abstract all titles included in
the Psychological Index. Although foreign cooperation has been
obtained, all of the abstracts will appear in English only. As a result
of the starting of Psychological Abstracts and in conformity with
a vote of the American Psychological Association at the Ithaca
meeting, the Psychological Bulletin will revert to its original form
after January 1, 1927, and will attempt to more completely cover the
field of psychology by general reviews and summaries, discussions
and special reviews of books.